Spoken Term Detection Based on a Syllable N-gram Index at the NTCIR-11 SpokenQuery&Doc Task

Authors

  • Nagisa Sakamoto
  • Kazumasa Yamamoto
  • Seiichi Nakagawa
Abstract

For spoken term detection, it is crucial to handle out-of-vocabulary (OOV) words and the misrecognition of spoken words. Therefore, various recognition and retrieval methods based on sub-word units have been proposed. We have also proposed a distant n-gram indexing/retrieval method for spoken queries, which is based on syllable n-grams and incorporates a distance metric in a syllable lattice. The distance represents a confidence score for the syllable n-gram under assumed recognition errors such as substitution, insertion, and deletion errors. To address spoken queries, we propose combining candidates obtained from several ASR systems based on syllable or word units. We ran experiments on the NTCIR-11 SpokenQuery&Doc Task and report the evaluation results.
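To make the idea of a syllable n-gram index concrete, here is a minimal sketch. It is not the authors' distant n-gram method: it indexes only a 1-best syllable sequence (no lattice, no distance metric), and the names `build_index`, `detect`, and `min_ratio`, as well as the romanized syllables, are assumptions made for illustration. Each query n-gram votes for a candidate start position, so a query with one misrecognized syllable still yields a partial match.

```python
from collections import defaultdict


def build_index(docs, n=3):
    """Map each syllable n-gram to the (doc_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, syllables in docs.items():
        for pos in range(len(syllables) - n + 1):
            index[tuple(syllables[pos:pos + n])].append((doc_id, pos))
    return index


def detect(index, query, n=3, min_ratio=0.5):
    """Return (doc_id, start, score) candidates, highest score first.

    score is the fraction of query n-grams found at a consistent offset
    from the same start position (a crude stand-in for a confidence score).
    """
    grams = [tuple(query[i:i + n]) for i in range(len(query) - n + 1)]
    votes = defaultdict(int)
    for offset, gram in enumerate(grams):
        for doc_id, pos in index.get(gram, ()):
            # A match at `pos` for the gram at `offset` implies this start.
            votes[(doc_id, pos - offset)] += 1
    hits = [
        (doc_id, start, count / len(grams))
        for (doc_id, start), count in votes.items()
        if count / len(grams) >= min_ratio
    ]
    return sorted(hits, key=lambda hit: -hit[2])


# Hypothetical recognized syllable sequences for two spoken documents.
docs = {
    "lec1": ["ko", "n", "pyu", "u", "ta", "no", "o", "n", "se", "e"],
    "lec2": ["o", "n", "se", "e", "ni", "n", "shi", "ki"],
}
index = build_index(docs)
print(detect(index, ["o", "n", "se", "e"]))  # exact query
print(detect(index, ["o", "n", "se", "i"]))  # query with one substituted syllable
```

Voting on start positions keeps the lookup cheap, but it only tolerates substitutions; handling insertions and deletions is what a distance-aware index over a lattice, as described in the abstract, is meant to address.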


Similar resources

Spoken Document Retrieval Experiments for SpokenQuery&Doc at Ryukoku University (RYSDT)

In this paper, we describe the spoken document retrieval (SDR) systems of Ryukoku University, which participated in the NTCIR-11 “SpokenQuery&Doc” task. The NTCIR-11 SpokenQuery&Doc task comprises the “spoken content retrieval (SCR) subtask” and the “spoken term detection (STD) subtask”. We participated in the SCR and STD subtasks as team RYSDT. This paper describes our SDR and STD systems.


Overview of the NTCIR-11 SpokenQuery&Doc Task

This paper presents an overview of the Spoken Query and Spoken Document retrieval (SpokenQuery&Doc) task at the NTCIR-11 Workshop. This task included spoken query driven spoken content retrieval (SQ-SCR) as the main sub-task, with spoken query driven spoken term detection (SQ-STD) as an additional sub-task. The paper describes details of each sub-task, the data used, the creation of the sp...


STD Method Based on Hash Function for NTCIR11 SpokenQuery&Doc Task

In this paper, we describe a spoken term detection (STD) method used in the Spoken Query and Documents task of the NTCIR-11 meeting. Our STD method extracts sub-sequences from the syllable-based speech recognition candidates of the target speech and converts them into bit sequences using a hash function. The query is also converted into a bit sequence in the same way. Term detection candidates ...


STD Score Combination with Acoustic Likelihood and Robust SCR Models for False Positives: Experiments at NTCIR-11 SpokenQuery&Doc

In this paper, we report our experiments at the NTCIR-11 SpokenQuery&Doc task [1]. We participated in both the STD and SCR subtasks of SpokenDoc. For the STD subtask, we try to improve detection accuracy by combining the DTW distance between syllable sequences and the acoustic likelihood of the detected speech segment. The final combined score, which is obtained by applying logistic regression on the, was...


Overview of the NTCIR-12 SpokenQuery&Doc-2 Task

This paper presents an overview of the Spoken Query and Spoken Document retrieval (SpokenQuery&Doc-2) task at the NTCIR-12 Workshop. This task included spoken query driven spoken content retrieval (SQ-SCR) and a spoken query driven spoken term detection (SQ-STD) as the two subtasks. The paper describes details of each sub-task, the data used, the creation of the speech recognition systems used ...




Publication date: 2014